Abstract


This  report  presents  the  use  of  the  smooth  ROC  method  for  evaluation  of  biometric  systems,  which 
can  be  used  in  developing  decision  making  rules  for  face  recognition  triaging.  From  a  performance 
evaluation  perspective,  the  problem  can  be  decomposed  into  two  subproblems:  the  first  deals  with 
detecting  the  presence  or  absence  of  a  specified  person  of  interest  (POI)  in  a  given  video  frame, 
and  the  second  measures  the  strength  of  matching  between  the  POI  image  and  the  face  found  in 
the video frame. The former can be viewed as a binary decision of detection (or the lack
thereof), and the latter is based on a score that depicts the strength of matching. Ideally, the objective
of the system is to measure the agreement between higher matching scores and the presence of the
POI in the video frame, based on the assumption that a higher matching score corresponds to a
higher likelihood of the POI being present in the frame. The accumulation of this performance
information  across  the  stream  of  video  frames  will  yield  information  required  for  performance 
assessment  analysis  of  the  system  as  a  whole. 

Keywords:  video-surveillance,  face  recognition  in  video,  instant  face  recognition,  watch-list 
screening,  biometrics,  reliability,  performance  evaluation 
1 Problem definition


Given a stream of frames F = {Ft} obtained from a camera source and a list of images of
particular persons of interest (POIs), call it L = {Li}, the objective of the system is to detect the
presence of the target image L* in F. First, let us reduce the problem to a single frame Ft extracted
from the stream F. If a face is detected in frame Ft, the facial recognition system computes a
matching score Si for each image of interest Li in L, and it does so for every video frame Ft. This means that a
given face in Ft may be matched to several images of interest in L, potentially generating multiple
hits, most of which are likely to be false positives, because only one image of interest is an exact match to
the face in frame Ft. Therefore, it is crucial that the system produces Si scores whose magnitudes
reflect the strength or quality of the matching. Using the magnitudes of these matching scores, the
system will be able to prioritize the strongest matches to decide an appropriate course of action.
The objective of the evaluation is to assess how well the magnitudes of these Si scores produce the
desired hits in the video stream.
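
The prioritization step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `rank_hits`, the `min_score` cut-off and the POI identifiers are hypothetical and do not come from the system under evaluation.

```python
from typing import Dict, List, Tuple


def rank_hits(scores: Dict[str, float], min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Return (poi_id, score) pairs for one frame, strongest match first.

    `scores` maps each watch-list image Li to its matching score Si
    for the face found in a single frame Ft.
    """
    hits = [(poi, s) for poi, s in scores.items() if s >= min_score]
    # Sort by score so the strongest candidate match is handled first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for one frame against a three-image watch list.
frame_scores = {"poi_a": 0.12, "poi_b": 0.93, "poi_c": 0.41}
ranked = rank_hits(frame_scores, min_score=0.4)
print(ranked)
```

Only the ordering matters here; whether the top-ranked hit deserves an alarm is exactly what the evaluation of the score magnitudes is meant to establish.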


2  Performance  analysis  method 

The evaluation of the overall performance of the system becomes intuitive with the use of the
smooth ROC (smROC) method [1]. Ideally, the system should detect all instances of strong matches whilst
raising the fewest false alarms. However, the performance of this system depends on
the ability of the matching scores Si to capture the desired matching based on their magnitudes.
Therefore, the smROC performance metric is advantageous because it can measure
the agreement between the magnitudes of continuous-valued scores Si and a binary decision. The
binary decision represents whether the recognized face is of interest or not, and the continuous value
is the score Si of matching a face in frame Ft to the image of interest Li. The smROC method
plots each individual instance of matching Li ∈ Ft as a line segment; together these segments form the smROC
curve. The corresponding Si scores determine the slopes of the line
segments: scores equal to 1 are represented by vertical line segments, matching scores Si of
zero are plotted as horizontal line segments, and line segments with intermediate slopes represent
Si scores between zero and one. In other words, decreasing Si scores result in line segments being
rotated clockwise, from vertical towards horizontal, in proportion to the magnitude of Si.


3  Interpreting  the  results 

Plotting  the  above  line  segments,  which  correspond  to  individual  instances  of  matchings,  in  a 
decreasing  order  of  their  corresponding  scores  for  a  given  set  of  matchings,  produces  the  smROC 
curve. When the curve is convex up, it means that the strongest positive matches (where the POI
is  indeed  in  the  video  frame)  have  been  assigned  higher  scores,  and  the  weakest  negative  matches 
(where  the  POI  is  not  in  the  video  frame)  are  assigned  low  scores.  Therefore,  the  ideal  performance 
will  have  an  smROC  curve  placed  towards  the  north-west  corner  of  the  plot,  which  also  produces 
the  highest  area  under  the  smROC  curve.  Thus,  calculating  this  area  under  the  curve  can  provide  a 
scalar  (numeric)  summary  of  the  overall  performance  of  the  system  on  the  given  set  of  matchings. 

4  A  sample  analysis 

For illustration, consider the sample matching scores Si listed in Table 1. In this example, instances
of detecting any POI are recorded in Table 1. The top row shows instance numbers (a unique
identifier), the second row lists the corresponding labels indicating whether the matched image is
that of a POI or not (a label entry of yes/no corresponds to the ground truth of image Li being
in frame Ft), and Si is the matching score between the POI image and the face in the frame for instance i, as
calculated by the system. For the purpose of performance evaluation, these instances of matchings
are sorted in decreasing order of their matching scores (from left to right in Table 1). The
corresponding smROC curve is presented in Figure 1. Plotting the smROC curve and calculating
the area under it follow Algorithm 2 published in [1].


Table 1: Sample matching scores Si of POI images Li in video frames

i         7    3    13   12   9    5    10   6    4    1    8    11   14   2
Li ∈ Ft   yes  yes  yes  yes  yes  yes  yes  no   yes  no   no   yes  no   no
Si        1    1    1    1    .96  .93  .89  .66  .49  .43  .30  .29  .04  .01

When the Si scores are assigned in perfect agreement with the ground truth labels, the curve
is  expected  to  follow  the  blue  line  depicting  a  vertical  rise  followed  by  a  horizontal  run.  In  this 
case,  the  curve  shows  that  the  positive  and  negative  matchings  are  assigned  high  and  low  scores 
respectively  demonstrating  a  correct  performance  of  the  system  because  high  scores  (vertical  line 
segments)  precede  the  low  ones.  In  addition,  the  area  under  the  blue  curve  is  maximal  at  value 
of  1.0  indicating  perfect  ranking  performance.  Alternatively,  when  the  score  magnitudes  fail  to 
depict  the  strength  of  correct  matching,  the  performance  is  said  to  be  random  and  is  expected  to 
follow  the  dotted  black  diagonal  line  in  the  same  figure.  If  we  consider  the  red  curve  in  Figure 
1  plotted  for  those  matchings  listed  in  Table  1,  we  see  that  most  of  the  high  scores  coincide  with 
yes  labels  and  most  of  the  low  scores  are  assigned  to  no  labels.  This  suggests  that  the  sample 
scores  are  reasonably  well  assigned  because  they  are  better  than  random  although  they  do  not 
achieve the perfect ranking (instances 6 and 11 are incorrectly placed in the order of score values).
[Figure 1 plot, titled "Scored ROC Curve"]


Figure 1: The smROC (or Scored ROC) curve for the matchings listed in Table 1. The blue solid
line depicts the ideal (perfect) performance and the black dotted line represents the random performance.
The area under the red smROC curve represents a scalar summary of the performance
of the system on the given set of matchings.

Furthermore, the score values are not all ones for the yes labels, and instances with no labels have
nonzero scores. As a result, the area under the red curve fails to reach the 100% level
depicted by the blue solid curve; however, it remains above the random performance indicated by
the 0.5 area under the dotted black diagonal line. Comparing the scalar values of the areas under
the three curves (blue, red and black) produces the desired performance comparison.

The area under the smROC curve represents a scalar summary of the performance of the matching
scores as assigned to matching instances. However, further examination of the curve itself can
identify individually interesting instances. For instance, the numbers shown next to individual line
segments along the red curve in the figure correspond to the unique identifiers of the matching instances
listed in Table 1. If we visually inspect individual line segments, we can see that instances
7, 3, 13, and 12 result in vertical line segments, indicating a very strong match. The next group of
instances, 9, 5 and 10, are more vertical than the remaining instances but are not completely vertical.
Finally, line segments 4 and 11 are the most vertical among the remaining instances but,
compared to the previous two groups, they appear horizontal. If we examine the labels of these
three groups of instances, we can see that they are all labeled yes, indicating that the associated
video frame matches the corresponding image of the POI involved. Therefore, these three
groups of instances represent positive matches detected by the system. Furthermore, it is clear that
there is a significant change of direction along the curve between line segments 10 and 6. Setting
the operational alarm threshold at that point will result in a high number of hits (7 out of 9 yes instances are
captured) with no false alarms.
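
The effect of placing the threshold between instances 10 and 6 can be checked directly against Table 1. The sketch below is illustrative only; the function name `hits_and_false_alarms` and the choice of 0.89 (the score of instance 10) as the threshold value are assumptions made for this example.

```python
# Table 1, sorted by decreasing score: (instance id, ground-truth label, score Si).
TABLE_1 = [
    (7, True, 1.00), (3, True, 1.00), (13, True, 1.00), (12, True, 1.00),
    (9, True, 0.96), (5, True, 0.93), (10, True, 0.89), (6, False, 0.66),
    (4, True, 0.49), (1, False, 0.43), (8, False, 0.30), (11, True, 0.29),
    (14, False, 0.04), (2, False, 0.01),
]


def hits_and_false_alarms(rows, threshold):
    """Count alarms raised on true POI matches (hits) and on non-matches (false alarms)."""
    hits = sum(1 for _, is_poi, s in rows if is_poi and s >= threshold)
    false_alarms = sum(1 for _, is_poi, s in rows if not is_poi and s >= threshold)
    return hits, false_alarms


total_yes = sum(1 for _, is_poi, _ in TABLE_1 if is_poi)
hits, false_alarms = hits_and_false_alarms(TABLE_1, threshold=0.89)
print(f"{hits} of {total_yes} hits, {false_alarms} false alarms")
# prints "7 of 9 hits, 0 false alarms"
```

Lowering the threshold to 0.49 would capture instance 4 as well, but only at the cost of a false alarm on instance 6.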

Effectively,  the  same  approach  can  be  used  for  the  detection  of  true  positive  matchings  among 
the  many  matchings  executed  for  the  stream  of  video  frames.  This  suggests  an  added  benefit  of 
using the smROC to fine-tune and monitor the performance of the proposed system: it provides a
visual representation of possible thresholds which can be used for raising alarms by the system.
For  instance,  the  alarm  threshold  may  be  set  to  raise  a  red  alarm  for  the  first  two  groups  discussed 
above,  an  amber  alarm  for  the  third  group  and  no  alarm  for  the  remaining  instances  of  matches. 


5  Conclusions 

The use of the smROC curve to assess the performance of the proposed facial detection system provides
several benefits particular to this problem. The correctness of image detection and the balance of
the trade-off between hits and alarms rely on how well the matching scores are assigned to
matching instances. The magnitudes of the matching scores are essential for the prioritization of alarms,
and they enable the system to capture as many positive detections as possible while raising minimal false
alarms. The smROC method is the only method reported in literature that is able to incorporate
the magnitudes of the scores into the analysis of hits versus alarms. Therefore, the smROC
may be used to measure the detection performance in various settings, such as:

•  Testing  and  validation:  for  a  given  set  of  matchings,  the  area  under  the  smROC  curve  could 
be  used  to  demonstrate  the  ability  of  the  system  to  detect  the  desired  images.  This  process 
should  be  executed  repeatedly  on  various  data  point  sets  in  various  testing  experiments  to 
assess  the  expected  performance  of  the  system.  An  aggregated  report  of  the  accumulated 
results  will  support  a  reliable  conclusion  of  the  expected  overall  performance. 

• Monitoring system operation: the calculation of the area under the smROC curve provides
a simplified performance summary that can be regularly monitored over a window of
time. This can potentially allow for the detection of possible deterioration in system performance
over time. For instance, measuring the performance over a window of 2-3 hours
on a daily basis can reveal deviations from the expected performance (as estimated during
testing and validation). This may be important depending on the method used to calculate the
matching scores, because many methods assume that the underlying characteristics of the
domain hardly change over time. In real life, this could not be further from the truth. Therefore,
it is important to monitor the operational performance of the matching model to detect
underlying distribution changes.

• Determining alarm thresholds: as discussed previously, individual matching instances
are represented as line segments whose slopes are directly associated with the correctness
and magnitudes of matchings between images and video frames. We have illustrated how the
examination of these slopes can help prioritize matching instances to determine the alarm threshold
that achieves the desired rate of hits versus alarms. Repeated thorough testing and
validation of the system will reveal natural "kinks" in the performance curves that can allow
the selection of appropriate alarm thresholds.
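
As a small illustration of the monitoring use listed above, the per-window areas can be compared with the value estimated during testing and validation. This is a sketch under assumptions: the function name `flag_deviations`, the tolerance of 0.05 and the sample values are hypothetical, and how the per-window area is computed is left to the evaluation method itself.

```python
from typing import List, Sequence


def flag_deviations(window_areas: Sequence[float], baseline: float,
                    tolerance: float = 0.05) -> List[int]:
    """Return indices of monitoring windows whose area under the curve
    differs from the validation baseline by more than `tolerance`."""
    return [k for k, area in enumerate(window_areas)
            if abs(area - baseline) > tolerance]


# Hypothetical daily summaries compared with a baseline of 0.90.
daily_areas = [0.90, 0.89, 0.78, 0.91]
flagged = flag_deviations(daily_areas, baseline=0.90)
print(flagged)  # prints [2]
```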
• The best way - try it!

•  Focus  on  existing  techniques  (learning 
algorithms),  and  on  the  necessary  data 
engineering 

•  More  than  20  years 

•  Produced  more  than  100  graduates  in  the 
area  of  data  and  text  mining 

•  Currently  six  profs,  several  PDFs,  25 
graduate  students 

•  Focus  on  applications 

•  Internationally  recognized 

•  Some  experience  with  vision  data 
(RADARSAT,  Landsat  RS  data) 

Task:  face  triaging 

A  preliminary  solution 

•  Given: 

-  List  of  k  POIs  with  their  facial  images 

-  scores  for  each  POI  for  a  stream  of  frames  in  which  a  POI  occurs  or  not: 

-  0000000000003000000000000000000000000000080000000000000000000 

-  0000000000002000000000000000000000000000090000000000000000000 

-  0000002000001000000000000000001000000000090000000000000000000 

-  0000002000001000000000000000001000000000090000000000000000000 

-  0000000000002000000000000000000000000000080000000000000000000 

-  0000002000002000000000000000000000000000090000000000000000000 

-  0000000000002000000000000000000000000000090000000000000000000 

-  0000000000003000000000000000000000000000090000000000000000000 

-  0000000000002000000000000000000000000000090000000000000000000 

-  0000000000003000000000000000000000000000080000020000000000000 

for  each  time  step  t 
for  each  of  the  66  faces: 

-  keep  an  incremental  count  of  scores 

-  smooth  the  count  as  the  t  grows,  eg  sm_countt  =  X sco^(,) 

-  we  test  the  sm_countt  against  thresholds  thrY,  thrR: 
when  at  some  t  it  exceeds  the  thrY  or  thrR  threshold, 
then  we  raise  Y  or  R  alarm,  respectively 

• Find: raise alarm at the moment when a given POI is believed to be
detected in the video stream
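
The per-POI loop of the preliminary solution can be sketched as follows. The smoothing rule is not legible in the source, so the exponentially decayed running sum used here, the `decay` parameter, the function name `alarms_for_poi` and the threshold values are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Optional


def alarms_for_poi(scores: Iterable[float], thr_y: float, thr_r: float,
                   decay: float = 0.9) -> List[Optional[str]]:
    """Return 'R', 'Y' or None for each time step of one POI's score stream.

    Assumes thr_r > thr_y, so a red alarm always implies the yellow level.
    """
    sm_count = 0.0
    alarms: List[Optional[str]] = []
    for score in scores:
        # Assumed smoothing: older evidence fades while new scores are added.
        sm_count = decay * sm_count + score
        if sm_count > thr_r:
            alarms.append("R")
        elif sm_count > thr_y:
            alarms.append("Y")
        else:
            alarms.append(None)
    return alarms


# A short hypothetical stream for one POI: mostly zeros with two bursts.
stream = [0, 0, 3, 0, 0, 8, 0]
result = alarms_for_poi(stream, thr_y=2.0, thr_r=6.0)
print(result)
```

With these made-up values the first burst raises a yellow alarm and the second a red one; choosing thrY and thrR well is the part the slides propose to learn from data.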

The “smart” part...

...is a good selection of the thresholds
thrY, thrR

We propose to use Machine Learning for
this


And also feature engineering

•  What are the good features?

•  How to build them - cooperation with CV

•  Noise in features

•  Feature selection?

Performance evaluation

•  Not accuracy

•  ROC or derivative?

•  smROC?

•  Some form of lift?

•  Likely to be determined by practice


Q&A and discussion...

Smooth ROC Motivation

Scores allow:

the classifications based on decisions, ranking, and
scores (confidence),

the visualization of score margins, and

the identification of gaps in scores.

Of course, probabilities tell us all this plus more
(theoretical), but not all scores are good estimates
of probabilities!


Applications

Comparing user preferences

Evaluating relevance in search applications (facial
recognition in images)

Magnitude-preserving ranking (Cortes et al., ICML'07)

Research Tool

Bioinformatics (gene expression)


Visualizing  The  Scores 


The  Appropriateness  of  scores 


(Appropriateness of Scores)

Label   High score   Low score
+       yes          no
-       no           yes

(Accuracy of Appropriate Scores)

Score   Predicted label Y   Predicted label N
High    correct             incorrect
Low     incorrect           correct

(Accuracy of Inappropriate Scores)

Score   Predicted label Y   Predicted label N
High    incorrect           correct
Low     correct             incorrect

\[
\phi(x_i) =
\begin{cases}
S_i & \text{if } x_i \in \{H^{+} \cup L^{-}\} \quad \text{(Appropriate Scores)} \\
1 - S_i & \text{if } x_i \in \{H^{-} \cup L^{+}\} \quad \text{(Inappropriate Scores)}
\end{cases}
\]
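
The appropriateness rule above (keep the score when it is appropriate, use one minus the score when it is not) can be written as a small function. This is a sketch under assumptions: the cut-off `mid` that separates High from Low scores is passed in by the caller, and treating a score equal to `mid` as High is a choice made only for this example.

```python
def phi(score: float, is_positive: bool, mid: float) -> float:
    """Appropriateness-adjusted score for one instance.

    High positives and Low negatives are appropriate and keep their score;
    Low positives and High negatives are inappropriate and get 1 - score.
    """
    is_high = score >= mid  # assumed tie rule: a score equal to mid counts as High
    appropriate = (is_positive and is_high) or (not is_positive and not is_high)
    return score if appropriate else 1.0 - score


# A few instances from Table 1, with an assumed cut-off of 0.5.
print(phi(0.96, True, 0.5))   # High positive: appropriate, score kept
print(phi(0.66, False, 0.5))  # High negative: inappropriate, 1 - score
print(phi(0.29, True, 0.5))   # Low positive: inappropriate, 1 - score
print(phi(0.04, False, 0.5))  # Low negative: appropriate, score kept
```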


Constructing  The  Smooth  ROC 
Curve 

Mid  —  I(m+  +  — — ) 

2  c 


[Figure: line segments with score Si drawn for positive and negative instances, shown separately for appropriate and inappropriate scores]

smTPR  =  ^  ~  smFPR=-^- 

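\[
\alpha_v = \sum_{i=1}^{|H^{+}|} S_i + \sum_{i=1}^{|L^{-}|} S_i + \sum_{i=1}^{|L^{+}|} (1 - S_i) + \sum_{i=1}^{|H^{-}|} (1 - S_i) = \sum_{i=1}^{n} \phi(x_i)
\]

\[
\alpha_h = \sum_{i=1}^{|H^{+}|} (1 - S_i) + \sum_{i=1}^{|L^{-}|} (1 - S_i) + \sum_{i=1}^{|L^{+}|} S_i + \sum_{i=1}^{|H^{-}|} S_i = \sum_{i=1}^{n} \bigl(1 - \phi(x_i)\bigr)
\]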
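
The two totals above can be computed directly from labelled scores. The sketch below is illustrative: the High/Low cut-off `mid` is supplied by the caller and its value of 0.5 is an assumption, as are the function names. Since every instance contributes its adjusted score to one total and the complement to the other, the two totals always sum to the number of instances n.

```python
# Table 1 as (ground-truth label, score Si) pairs.
TABLE_1 = [
    (True, 1.00), (True, 1.00), (True, 1.00), (True, 1.00),
    (True, 0.96), (True, 0.93), (True, 0.89), (False, 0.66),
    (True, 0.49), (False, 0.43), (False, 0.30), (True, 0.29),
    (False, 0.04), (False, 0.01),
]


def phi(score, is_positive, mid):
    """Score for appropriate instances, 1 - score for inappropriate ones."""
    is_high = score >= mid  # assumed tie rule
    appropriate = (is_positive and is_high) or (not is_positive and not is_high)
    return score if appropriate else 1.0 - score


def alpha_totals(rows, mid):
    """Return (alpha_v, alpha_h): total vertical and horizontal extent of the curve."""
    phis = [phi(s, label, mid) for label, s in rows]
    alpha_v = sum(phis)
    alpha_h = sum(1.0 - p for p in phis)
    return alpha_v, alpha_h


alpha_v, alpha_h = alpha_totals(TABLE_1, mid=0.5)
print(alpha_v, alpha_h, alpha_v + alpha_h)
```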


The  Area  Under  smROC  Curves 


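\[
\mathrm{smAUC} = \frac{1}{\alpha_v \alpha_h} \sum_{i=1}^{n} \sum_{j=1}^{n} \phi(x_i)\, \psi(x_i, x_j)
\]

\[
\psi(x_i, x_j) =
\begin{cases}
1 - \phi(x_j) & \text{for } (S_i > S_j) \text{ and } (i \neq j) \\
\tfrac{1}{2}\bigl(1 - \phi(x_i)\bigr) & \text{for } i = j \\
0 & \text{otherwise}
\end{cases}
\]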

Compare  all  points  pairwise. 

Measure  the  differences  in  classifications 


weighted  by  the  magnify 
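
The pairwise sum for the area can be checked against a direct geometric calculation. The sketch below makes several assumptions: the High/Low cut-off `mid` is caller-supplied (0.5 here), the function names are hypothetical, and the first case of the pairwise weight is read as one minus the adjusted score of the second point, which is the reading under which the double sum equals the area under the piecewise-linear curve. Tied scores are given zero weight by the pairwise rule, so the two calculations are only expected to agree when tied instances have no horizontal extent, as in Table 1.

```python
TABLE_1 = [  # (ground-truth label, score Si)
    (True, 1.00), (True, 1.00), (True, 1.00), (True, 1.00),
    (True, 0.96), (True, 0.93), (True, 0.89), (False, 0.66),
    (True, 0.49), (False, 0.43), (False, 0.30), (True, 0.29),
    (False, 0.04), (False, 0.01),
]


def phi(score, is_positive, mid):
    """Score for appropriate instances, 1 - score for inappropriate ones."""
    is_high = score >= mid  # assumed tie rule
    appropriate = (is_positive and is_high) or (not is_positive and not is_high)
    return score if appropriate else 1.0 - score


def _totals(phis):
    alpha_v = sum(phis)
    alpha_h = sum(1.0 - p for p in phis)
    if alpha_v == 0 or alpha_h == 0:
        raise ValueError("curve has no vertical or no horizontal extent")
    return alpha_v, alpha_h


def smauc_pairwise(rows, mid):
    """Area as the normalised double sum over all pairs of points."""
    scores = [s for _, s in rows]
    phis = [phi(s, label, mid) for label, s in rows]
    alpha_v, alpha_h = _totals(phis)
    total = 0.0
    for i in range(len(rows)):
        for j in range(len(rows)):
            if i == j:
                psi = 0.5 * (1.0 - phis[i])
            elif scores[i] > scores[j]:
                psi = 1.0 - phis[j]
            else:
                psi = 0.0
            total += phis[i] * psi
    return total / (alpha_v * alpha_h)


def smauc_geometric(rows, mid):
    """Area under the curve built segment by segment in decreasing score order."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    phis = [phi(s, label, mid) for label, s in ordered]
    alpha_v, alpha_h = _totals(phis)
    area, height = 0.0, 0.0
    for p in phis:
        rise, run = p / alpha_v, (1.0 - p) / alpha_h
        area += run * (height + 0.5 * rise)  # trapezoid under this segment
        height += rise
    return area


pairwise = smauc_pairwise(TABLE_1, mid=0.5)
geometric = smauc_geometric(TABLE_1, mid=0.5)
print(pairwise, geometric)
```

For the Table 1 sample both values coincide and lie between the random level of 0.5 and the ideal value of 1.0, in line with the discussion of Figure 1.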


